Statistical Analysis and Application of Ensemble Method on the Netflix Challenge
Abstract
1. Introduction
The Netflix Prize was proposed by Netflix, Inc. to seek accurate predictions of movie ratings. As one group in the Stanford Netflix Prize team, our responsibility is to explore useful statistics and data curation in the training data set, and to explore ensemble methods for improving prediction accuracy. We imported the Netflix data into a MySQL database for aggregation, so that the aggregated results can be analyzed with Matlab or C++ scripts. So far, we have completed multiple clustering analyses of the movies and the customers using the K-means clustering technique learnt in class [1]. We clustered the movies by several criteria: the number of ratings a movie received, the average rating of a movie, the time progression of monthly rating counts and rating averages, and the probability of each rating value for a movie. The customers are clustered by similar criteria, except for the time progression, because a customer's monthly rating counts and rating averages change from month to month depending on the movies watched in those months. After the training data have been clustered by these criteria, we used ensemble methods to combine the advantages of various classifiers and obtain improved results.
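As an illustration of one of the clustering criteria above (the probability of each rating value for a movie), the following is a minimal Python sketch of K-means on per-movie rating distributions. It is not the authors' Matlab or C++ code: the function names (`rating_distribution`, `kmeans`), the toy ratings, the choice of k = 2, and the squared Euclidean distance are all assumptions made for this example.

```python
import random


def rating_distribution(ratings):
    """Return the fraction of 1..5 star ratings for one movie."""
    counts = [0] * 5
    for r in ratings:
        counts[r - 1] += 1
    total = len(ratings)
    return [c / total for c in counts]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Plain K-means (Lloyd's algorithm); returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise the centroids with k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy data: movie id -> list of star ratings (1..5).
movies = {
    "A": [5, 5, 4, 5, 4],
    "B": [4, 5, 5, 5],
    "C": [1, 2, 1, 1, 2],
    "D": [2, 1, 1],
}
names = list(movies)
points = [rating_distribution(movies[m]) for m in names]
labels, centroids = kmeans(points, k=2)

for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```

On this toy input the two highly rated movies end up in one cluster and the two poorly rated movies in the other. The same routine could be applied to the other criteria by swapping the feature vector, for example a one-dimensional vector holding the number of ratings or the average rating.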
Publication date: 2006